All Questions
2 questions
2 votes
2 answers
301 views
Advantage computed the wrong way?
Here is the code written by Maxim Lapan. I am reading his book (Deep Reinforcement Learning Hands-on). I have seen a line in his code which is really weird. In the accumulation of the policy gradient $...
1 vote
1 answer
257 views
Once the environments are vectorized, how should I gather immediate experiences for the agent?
My main goal right now is to train an agent using the A2C algorithm to solve the Atari Breakout game. So far I have succeeded in creating that code with a single agent and environment. To break the ...